fix: add resource limits, PodDisruptionBudgets, and backup error handling by Flegma · Pull Request #427 · 5stackgg/5stack-panel

Flegma · 2026-04-08T12:21:52Z

Summary

Add resource requests/limits to all deployments and statefulsets (API, Web, Redis, Hasura, MinIO, Typesense, TimescaleDB)
Create PodDisruptionBudgets for API, Hasura, and TimescaleDB to protect against involuntary disruptions
Fix backup CronJob: remove || true from apk add, add pg_dump output validation, add S3 upload error checking

Addresses #415 and #416

Test plan

Verify all pods start within resource limits
Verify PDBs are created and prevent draining below minAvailable
Verify backup CronJob fails properly on pg_dump or S3 upload errors
Confirm kustomize build succeeds with new PDB resources

…ling

Change PDBs from minAvailable:1 to maxUnavailable:1 so single-replica workloads don't block node drains and cluster upgrades. Bump API and Hasura memory limits from 512Mi to 1Gi and CPU from 500m to 1000m to handle NestJS+BullMQ+WebSocket and Hasura subscription load.

lukepolo · 2026-04-09T17:28:23Z

+              memory: "256Mi"
+              cpu: "250m"
+            limits:
+              memory: "1Gi"


for larger installs no. do max 4gb

Bumped API memory limit to 4Gi in ffa968e.

lukepolo · 2026-04-09T17:29:34Z

+              memory: "256Mi"
+              cpu: "250m"
+            limits:
+              memory: "1Gi"


way too little , in a production instance i have 4 GB. lets lmit to 4gb

Bumped Hasura memory limit to 4Gi in ffa968e.

lukepolo · 2026-04-09T17:31:02Z

also not a fan of CPU limits. we are not concerned enough here to starve these . memory limits on thse si mostly too low.

Keep CPU requests for scheduling but drop limits so containers aren't throttled under load.

Flegma · 2026-04-17T10:55:01Z

Removed CPU limits across all services in 6417828. CPU requests are still set for scheduling so pods land on adequately-sized nodes, but nothing throttles them now.

Memory limits: API/Hasura at 4Gi as you specified. Left the others (Redis 256Mi, MinIO 512Mi, TimescaleDB 1Gi, Typesense 512Mi, Web 256Mi) at the initial values — if any of those are too low, let me know what your prod values are and I'll bump.

Flegma added 2 commits April 8, 2026 14:21

fix: add resource limits, PodDisruptionBudgets, and backup error hand…

a39837c

…ling

lukepolo reviewed Apr 9, 2026

View reviewed changes

Flegma added 2 commits April 10, 2026 11:42

fix: increase API and Hasura memory limits to 4Gi

ffa968e

fix: remove CPU limits per review feedback

6417828

Keep CPU requests for scheduling but drop limits so containers aren't throttled under load.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: add resource limits, PodDisruptionBudgets, and backup error handling#427

fix: add resource limits, PodDisruptionBudgets, and backup error handling#427
Flegma wants to merge 4 commits intomainfrom
audit/415-prod-readiness

Flegma commented Apr 8, 2026

Uh oh!

lukepolo Apr 9, 2026

Uh oh!

Flegma Apr 10, 2026

Uh oh!

lukepolo Apr 9, 2026

Uh oh!

Flegma Apr 10, 2026

Uh oh!

lukepolo commented Apr 9, 2026

Uh oh!

Flegma commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Flegma commented Apr 8, 2026

Summary

Test plan

Uh oh!

lukepolo Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Flegma Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

lukepolo Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Flegma Apr 10, 2026

Choose a reason for hiding this comment

Uh oh!

lukepolo commented Apr 9, 2026

Uh oh!

Flegma commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants